Performance of computational algorithms to deconvolve heterogeneous bulk ovarian tumor tissue depends on experimental factors

Background: Single-cell gene expression profiling provides unique opportunities to understand tumor heterogeneity and the tumor microenvironment. Because of cost and feasibility, profiling bulk tumors remains the primary population-scale analytical strategy. Many algorithms can deconvolve these tumors using single-cell profiles to infer their composition. While experimental choices do not change the true underlying composition of the tumor, they can affect the measurements produced by the assay.

Results: We generated a dataset of high-grade serous ovarian tumors with paired expression profiles produced using multiple strategies, to examine the extent to which experimental factors impact the results of downstream tumor deconvolution methods. We find that pooling samples for single-cell sequencing and subsequent demultiplexing has a minimal effect. We identify dissociation-induced differences that affect cell composition, leading to changes that may compromise the assumptions underlying some deconvolution algorithms. We also observe differences across mRNA enrichment methods that introduce additional discrepancies between the two data types. Finally, we find that experimental factors change cell composition estimates and that the impact differs by method.

Conclusions: Previous benchmarks of deconvolution methods have largely ignored experimental factors. We find that methods vary in their robustness to experimental factors. We provide recommendations for methods developers seeking to produce the next generation of deconvolution approaches and for scientists designing experiments using deconvolution to study tumor heterogeneity.

Supplementary Information: The online version contains supplementary material available at 10.1186/s13059-023-03077-7.

Fig. S1. Relaxed probability thresholds for hash demultiplexing increase the number of assigned cells. A) Assignments for Batch A where any cell with greater than 85% probability of originating from a sample is assigned to that sample. B) Assignments for Batch B at the 85% probability threshold. C) Assignments for Batch A at a threshold of greater than 80% probability of originating from a sample. D) Assignments for Batch B at the 80% probability threshold.
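
The thresholding rule described in the Fig. S1 caption can be illustrated with a minimal sketch. The per-cell probabilities, the sample names, and the helper functions `assign_cell` and `count_assigned` below are hypothetical and are not taken from the study's pipeline; the sketch only shows why a relaxed threshold assigns more cells.

```python
from typing import Dict, Optional


def assign_cell(sample_probs: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the most probable sample if its probability exceeds the
    threshold; otherwise return None, meaning the cell stays unassigned."""
    best_sample = max(sample_probs, key=sample_probs.get)
    return best_sample if sample_probs[best_sample] > threshold else None


def count_assigned(cells: Dict[str, Dict[str, float]], threshold: float) -> int:
    """Count the cells that receive a sample assignment at a given threshold."""
    return sum(assign_cell(probs, threshold) is not None for probs in cells.values())


# Hypothetical per-cell probabilities of originating from each pooled sample.
cells = {
    "cell_1": {"sample_A": 0.97, "sample_B": 0.03},
    "cell_2": {"sample_A": 0.17, "sample_B": 0.83},
    "cell_3": {"sample_A": 0.55, "sample_B": 0.45},
}

assigned_85 = count_assigned(cells, 0.85)  # stricter threshold
assigned_80 = count_assigned(cells, 0.80)  # relaxed threshold
print(assigned_85, assigned_80)
```

In this toy input, `cell_2` is assigned only at the relaxed threshold, and `cell_3` remains unassigned at both.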

Hippen, Omran, Weber, Jung, Drapkin, Doherty, Hicks, and Greene

Genetic demultiplexing is concordant across source of bulk reference genotypes. A) Confusion matrix of genetic demultiplexing assignments for Batch A when using reference genotypes from rRNA- Chunk samples vs rRNA- Dissociated samples. B) Genetic demultiplexing assignments for Batch B using reference genotypes from rRNA- Chunk samples vs rRNA- Dissociated samples. C) Confusion matrix of genetic demultiplexing assignments for Batch A when using reference genotypes from rRNA- Dissociated samples vs polyA+ Dissociated samples. D) Genetic demultiplexing assignments for Batch B using reference genotypes from rRNA- Dissociated samples vs polyA+ Dissociated samples.
Fig. S2. Hash demultiplexing demonstrates cell type bias. A) Proportion of cell types in Batch A across all cells and in unassigned cells at various probability thresholds. Epithelial cells and fibroblasts are proportionally greater and T cells proportionally lesser in unassigned cells than in all cells. B) Proportion of cell types in Batch B.

Fig. S4. Stromal cell types are more abundant in dissociated bulk samples. Results from Gene Set Enrichment Analysis of rRNA- Chunk samples vs rRNA- Dissociated samples. Gene signatures associated with endothelial cells, fibroblasts, macrophages, and other immune cells (blue) are more abundant in rRNA- Dissociated samples, whereas red blood cell gene signatures (orange) are more abundant in rRNA- Chunk samples.

Fig. S6. Robustness to very small reference profiles. A) The variance of deconvolution proportion estimates, stratified by cell type and method, when using our default reference profile (genetic), the reference profile of cells assigned to a sample by hash demultiplexing (hashing), and a simulated sample of approximately 2000 cells (Sim2000). B) Variance of proportion estimates across the same reference profiles as in A but also including results from Sim1000. C) Same as B but also including results from Sim500. D) Same as C but also including results from Sim200.
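
As a rough illustration of the quantity plotted in Fig. S6, the sketch below computes the variance of proportion estimates across reference profiles, stratified by method and cell type. The records, the method names, and the function `variance_by_stratum` are hypothetical. The use of the sample variance (n - 1 denominator) is also an assumption, since the caption does not specify the estimator.

```python
from collections import defaultdict
from statistics import variance

# Hypothetical records: (method, cell_type, reference_profile, estimated_proportion).
records = [
    ("method_X", "T cells", "genetic", 0.20),
    ("method_X", "T cells", "hashing", 0.22),
    ("method_X", "T cells", "Sim2000", 0.18),
    ("method_Y", "T cells", "genetic", 0.10),
    ("method_Y", "T cells", "hashing", 0.30),
    ("method_Y", "T cells", "Sim2000", 0.20),
]


def variance_by_stratum(rows):
    """Group estimates by (method, cell type) and return the sample variance
    of each group across reference profiles (assumed estimator)."""
    groups = defaultdict(list)
    for method, cell_type, _reference, proportion in rows:
        groups[(method, cell_type)].append(proportion)
    return {key: variance(values) for key, values in groups.items()}


strata = variance_by_stratum(records)
for (method, cell_type), var in sorted(strata.items()):
    print(f"{method}\t{cell_type}\t{var:.4f}")
```

A lower variance for a given method and cell type means its estimates change less when the reference profile changes, which is the sense of robustness the caption describes.
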
Fig. S7. Alternate deconvolution methods that return cell type scores do not match single cell proportions. A-G) Correlation between the cell type score returned by the deconvolution method and the corresponding proportion of cells in the scRNA-seq Individual sample. The name of the deconvolution method and the Pearson correlation (r value) is shown at the top of each panel.
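
The r value reported in each panel of Fig. S7 is a Pearson correlation. A minimal sketch of that computation follows, using hypothetical cell type scores and single-cell proportions; the function name `pearson_r` and the input values are illustrative only and do not reproduce any result from the study.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations (unnormalized covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations.
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (dev_x * dev_y)


# Hypothetical values for one cell type across four samples.
scores = [0.10, 0.40, 0.35, 0.80]        # cell type scores from a method
proportions = [0.05, 0.20, 0.25, 0.50]   # proportions observed in scRNA-seq
r = pearson_r(scores, proportions)
print(f"r = {r:.3f}")
```

Because a score is not itself a proportion, a high r indicates only that scores track proportions linearly across samples, not that the two agree in absolute value.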